Statistical Learning Methods Including Dimensionality Reduction
Abstract
This special issue, ‘Statistical learning methods including dimensionality reduction’, is concerned with situations where the main statistical problem of interest, for example regression, discrimination (supervised classification), or clustering (unsupervised classification), is to be combined with dimension reduction methods. The general objective of this special issue is to collect and present new statistical methodologies and learning methods that use a ‘simultaneous approach’ and combine the reduction of objects, variables, and/or dimensions with statistical problems such as regression and classification. The motivation for this emerging area of research stems from the fact that it has become standard practice to record large amounts of data (often automatically) and to store them in large databases in the form of high-dimensional multivariate observations. Examples include measurements on tens of thousands of genes in microarray experiments, continuous and multiple measurements on human body and brain function (e.g. fMRI and EEG), and automatic market-basket data accumulation at supermarket checkouts. For the statistician, the challenge then consists in extracting and interpreting these large volumes of data in a useful, coherent, and timely way. A major approach proceeds by reducing the dimensionality of the data (the number of objects, variables, situations, etc.) in a way that exposes the structure while maintaining the integrity of the data as a whole. On the one hand, this data reduction can be achieved by applying a supervised or unsupervised classification algorithm, or a learning algorithm, that yields classes of objects. Similarly, clustering can also be used to reduce the number of variables after defining appropriate proximity measures between variables.
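Variable clustering of the kind described above can be sketched in a few lines. This is a minimal illustration with synthetic data, not any specific method from the issue: the assumed proximity measure between variables is one minus the absolute correlation, and hierarchical clustering then groups near-duplicate variables together.

```python
# Hypothetical sketch: reducing the number of variables by clustering them
# with a correlation-based proximity measure (illustrative setup only).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=n)   # variable 3 nearly duplicates variable 0
X[:, 4] = X[:, 1] + 0.1 * rng.normal(size=n)   # variable 4 nearly duplicates variable 1

# assumed proximity between variables: 1 - |correlation|
dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
np.fill_diagonal(dist, 0.0)

# average-linkage hierarchical clustering on the condensed distance matrix
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=4, criterion="maxclust")  # reduce 6 variables to 4 groups
print(labels)
```

Each group of variables could then be represented by a single member or summary, reducing the dimension before (or jointly with) the main analysis.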
On the other hand, and even more frequently, data reduction is obtained by reducing the number of variables or dimensions through factorial techniques such as principal component analysis or regression methods (including, e.g., PLS), or by variable selection methods, and by basing the data interpretation only on the transformed or selected ‘informative’ variables. Two basic approaches are possible for this data reduction process: the first proceeds in a sequential way by first applying a data reduction technique to the original data matrix and then using the resulting classes or transformed variables to resolve the main problem in a second step (the ‘tandem approach’). The second and often more effective approach consists in combining the main problem and the data reduction problem directly into one single model and solving both problems simultaneously in the framework of this general model. This special issue focuses on this second strategy, where the main statistical problem of interest is combined with data and dimension reduction tools. It collects and presents new statistical tools and learning strategies for this ‘simultaneous’ approach. We concentrate on three types of main problems: regression, discrimination or classification, and clustering. 1. First, we consider dimensionality reduction in regression. A major tool for discarding ‘uninteresting’ regressors is provided by the LASSO technique of Tibshirani (1996), which leads to ‘regular’ LASSO estimators. In this issue, Meinshausen (2007) proposes a new regularization method, called the relaxed LASSO, that includes both soft and hard thresholding of estimators. The relaxed LASSO solutions include all regular LASSO solutions and are not affected by the presence of a large number of noise predictors. The relaxed LASSO produces estimators that are both sparser (in terms of the number of included variables) and better (in terms of squared-error loss) than the regular LASSO estimator.
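The contrast between the tandem and simultaneous approaches can be sketched on synthetic data. This is a hedged illustration with assumed parameter values, not the method of any paper in the issue: the LASSO selects variables while fitting, and the relaxed-LASSO idea is only crudely approximated here by an unpenalized refit on the selected variables, which removes the shrinkage bias of the regular LASSO estimate.

```python
# Illustrative sketch: 'tandem' (PCA then regression) versus simultaneous
# sparse regression (LASSO), with a crude relaxed-LASSO-style second stage.
# Data, alpha, and the number of components are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                  # only 3 informative predictors
y = X @ beta + 0.5 * rng.normal(size=n)

# Tandem approach: reduce dimension first, then regress on the components.
Z = PCA(n_components=5).fit_transform(X)
tandem = LinearRegression().fit(Z, y)

# Simultaneous approach: the LASSO selects variables while fitting.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

# Relaxed-LASSO-style second stage: unpenalized refit on the selected set.
relaxed = LinearRegression().fit(X[:, selected], y)
print(selected)
```

With the tandem approach the principal components are chosen without reference to `y`, so informative directions may be diluted; the simultaneous approach lets the regression objective drive the reduction.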
This paper can be seen in the context of a LASSO approach of Trendafilov and Jolliffe (2006), which uses a penalty function for the LASSO constraint. Ridge-type estimators provide another framework for data reduction. This approach is followed by Takane and Hwang (2007), who consider redundancy analysis methods and incorporate a ridge-type regularization
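A ridge-type estimator of the kind referred to above can be written in closed form. This is plain ridge regression as an assumed stand-in for illustration only; the redundancy-analysis setting of Takane and Hwang is more general.

```python
# Minimal sketch of a ridge-type estimator (plain ridge regression),
# shown as an assumed stand-in; data and penalty value are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 10
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=n)

lam = 1.0  # ridge penalty, chosen for illustration
# closed-form solutions: (X'X + lam*I)^{-1} X'y versus ordinary least squares
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# the ridge penalty shrinks the coefficient vector toward zero
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))
```

The shrinkage stabilizes the estimator when predictors are many or highly correlated, which is the situation that motivates regularized redundancy analysis.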
Similar articles
An Information Geometric Framework for Dimensionality Reduction
This report concerns the problem of dimensionality reduction through information geometric methods on statistical manifolds. While there has been considerable work recently presented regarding dimensionality reduction for the purposes of learning tasks such as classification, clustering, and visualization, these methods have focused primarily on Riemannian manifolds in Euclidean space. While su...
Organization: Ga Tech Res Corp - GIT. Title: Computational Methods for Nonlinear Dimension Reduction
Research and Education Activities: In this project we focus on computational and statistical methods for nonlinear dimension reduction problems. Our emphasis is placed on numerical stability analysis as well as statistical consistency of manifold learning algorithms, visualization problems involving nonlinear dimension reduction, and applications of nonlinear dimension methods in
Statistical learning for effective visual information retrieval
For effective retrieval of visual information, statistical learning plays a pivotal role. Statistical learning in such a context faces at least two major mathematical challenges: scarcity of training data, and imbalance of training classes. We present these challenges and outline our methods for addressing them: active learning, recursive subspace co-training, adaptive dimensionality reduction,...
A roadmap to multifactor dimensionality reduction methods
Complex diseases are determined by multiple genetic and environmental factors, acting alone as well as in interaction. To analyze interactions in genetic data, many statistical methods have been suggested, with most of them relying on statistical regression models. Given the known limitations of classical methods, approaches from the machine-learning community have also become attractive...
Covariance Operator Based Dimensionality Reduction with Extension to Semi-Supervised Settings
We consider the task of dimensionality reduction for regression (DRR) informed by real-valued multivariate labels. The problem is often treated as a regression task where the goal is to find a low-dimensional representation of the input data that preserves the statistical correlation with the targets. Recently, Covariance Operator Inverse Regression (COIR) was proposed as an effective solution t...
Manifold Learning Algorithms and Their Mathematical Foundations
This is the final project report for CPS2341. In this paper, we study several recently developed manifold learning algorithms or more specifically: Isomap, Laplacian Eigenmap and Diffusion Map. We motivate the importance of these algorithms through real applications such as dimensionality reduction, clustering and classification. We also give the basic mathematical justifications for these meth...
Journal: Computational Statistics & Data Analysis
Volume: 52, Issue: -
Pages: -
Published: 2007